4 research outputs found
REMARK-LLM: A Robust and Efficient Watermarking Framework for Generative Large Language Models
We present REMARK-LLM, a novel efficient, and robust watermarking framework
designed for texts generated by large language models (LLMs). Synthesizing
human-like content using LLMs necessitates vast computational resources and
extensive datasets, encapsulating critical intellectual property (IP). However,
the generated content is prone to malicious exploitation, including spamming
and plagiarism. To address the challenges, REMARK-LLM proposes three new
components: (i) a learning-based message encoding module to infuse binary
signatures into LLM-generated texts; (ii) a reparameterization module to
transform the dense distributions from the message encoding to the sparse
distribution of the watermarked textual tokens; (iii) a decoding module
dedicated for signature extraction; Furthermore, we introduce an optimized beam
search algorithm to guarantee the coherence and consistency of the generated
content. REMARK-LLM is rigorously trained to encourage the preservation of
semantic integrity in watermarked content, while ensuring effective watermark
retrieval. Extensive evaluations on multiple unseen datasets highlight
REMARK-LLM proficiency and transferability in inserting 2 times more signature
bits into the same texts when compared to prior art, all while maintaining
semantic integrity. Furthermore, REMARK-LLM exhibits better resilience against
a spectrum of watermark detection and removal attacks
Recommended from our members
Robust and Efficient Deep Learning for Multimedia Generation and Recognition
Deep Neural Networks (DNNs) have transformed the field of multimedia generation and recognition by replacing traditional hand-engineered systems in domains like vision, speech and text. This is because DNNs can operate end-to-end and model complex dependencies yielding state-of-the-art results on several generation and recognition benchmarks. However, there are three key challenges that need to be addressed for the practical, secure and reliable deployment of DNN-based media processing systems: 1) Robustness: DNNs are vulnerable to adversarial attacks, 2) Data-Requirement: DNNs often require large amounts of labelled data, 3) Compute-Efficiency: DNNs require extensive compute and resources.My research focuses on addressing the above three challenges of DNN based multimedia generation and recognition systems. On the robustness side, I first analyze practical vulnerabilities of DNN-based recognition systems and then propose a robust defense framework that can reliably identify adversarial inputs using perceptually informed input transformations. To address the challenge of data-requirement, I develop training frameworks that can effectively adapt foundation models trained using self-supervised learning for recognition and synthesis tasks in a data-efficient manner. Finally, to address the challenge of compute-efficiency, I propose acceleration methods using hardware-software codesign that significantly reduce the latency and resource-requirement while preserving the synthesis quality of DNN generators